I. INTRODUCTION


The availability of simplified protein models with reduced degrees of freedom is useful for
studying several biophysical problems. For example, the study of conformational changes in
large protein systems is still unfeasible even on the fastest computers. Conversely, with a
reduced model it becomes possible to study the thermodynamics of a 341-residue protein
in a crowded environment. Free-energy differences upon mutation can be calculated ab
initio only for small systems, while in more challenging cases one must resort to ad hoc
potentials. The elimination of solvent molecules is a standard example in which the use of
a simplified model makes it possible to study large and complex systems. In any case, the
main problem associated with reducing the number of degrees of freedom in a physical system
is the design of an effective potential that depends in a simple way on the remaining variables.

An approach that has been followed several times to obtain effective potentials for proteins
is the statistical one. The input data are the distribution of residue-residue contacts
between the different types of amino acids in a selected set of proteins. One has to solve
an inverse statistical-mechanics problem, searching for the potential which, during
natural evolution, generated the contact frequencies actually observed in the selected set of
proteins, assuming a Boltzmann relation between contact frequency and contact energy.

A variation of this approach is the calculation of contact energies based on the observed
correlations between mutations in homologous proteins, using the same framework as that
described in ref. [11] for a different problem, namely that of predicting the native
conformation of a protein from sequence information only. There, pairs of residues which
mutate in a correlated way in homologous sequences are regarded as being in spatial contact,
and from the full set of spatial contacts it was possible to reconstruct the three-dimensional
structure of several proteins. An inverse Ising-model formalism was used to subtract the
effect of indirect correlations from the experimental data.

The same formalism was then used in ref. [12] to design an effective, non-portable two-body
contact potential, assuming that the native conformation of the protein is known.
This potential proved successful in back-calculating residue-residue interactions in families
of proteins generated by simulated evolution. It was also used to calculate the thermodynamic
effect of mutations in four well-known proteins, giving correlation coefficients ranging
between 0.65 and 0.89 between the experimental and the calculated $\Delta\Delta G$.
The formalism at the basis of refs. [11] and [12] is meant to find the numerical values of the
parameters of the effective energy

$$
\mathcal{U}(\{\sigma_i\}) = \sum_{i<j} e_{ij}(\sigma_i,\sigma_j)\,\Delta(|r_i - r_j|) + \sum_i h_i(\sigma_i) \qquad (1)
$$

from the knowledge of the observed frequencies $f_i(\sigma)$ of appearance of amino acid $\sigma$ at
site $i$ and of the observed correlations $f_{ij}(\sigma,\tau)$ obtained in a set of $M$ aligned
homologous sequences of length $L$. In Eq. (1), $\sigma_i$ is the type of residue at position $i$
of the protein, $\Delta(|r_i - r_j|)$ is a contact function which takes the value 1 if residues
$i$ and $j$ are close in space (i.e., they contain a pair of heavy atoms closer than a distance
$d_r$) and zero otherwise, $e_{ij}(\sigma_i,\sigma_j)$ is the interaction energy between residue
$\sigma_i$ at position $i$ and $\sigma_j$ at position $j$, and $h_i(\sigma_i)$ is a one-body
potential acting on each residue.
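
As a minimal illustration of how an energy of the form of Eq. (1) could be evaluated numerically, the following Python sketch builds the contact function from heavy-atom coordinates and sums the two-body and one-body terms. The function names, the array layout (`e` stored as an L x L x q x q array, `h` as L x q) and the default cutoff are assumptions made here for illustration; they are not prescribed by the text.

```python
import numpy as np

def contact_map(heavy_atoms, d_r=4.0):
    """Contact function Delta: 1 if residues i and j contain a pair of
    heavy atoms closer than d_r, 0 otherwise.
    heavy_atoms: list of (n_i, 3) coordinate arrays, one per residue (assumed layout)."""
    L = len(heavy_atoms)
    delta = np.zeros((L, L), dtype=int)
    for i in range(L):
        for j in range(i + 1, L):
            # all pairwise distances between atoms of residue i and residue j
            diff = heavy_atoms[i][:, None, :] - heavy_atoms[j][None, :, :]
            if np.sqrt((diff ** 2).sum(axis=-1)).min() < d_r:
                delta[i, j] = delta[j, i] = 1
    return delta

def effective_energy(seq, e, h, delta):
    """sum_{i<j} e_ij(s_i, s_j) * Delta_ij + sum_i h_i(s_i), cf. Eq. (1).
    seq: integer residue types, length L; e: (L, L, q, q); h: (L, q)."""
    L = len(seq)
    two_body = sum(e[i, j, seq[i], seq[j]] * delta[i, j]
                   for i in range(L) for j in range(i + 1, L))
    one_body = sum(h[i, seq[i]] for i in range(L))
    return two_body + one_body
```

Looping over all residue pairs is quadratic in L, which is adequate for single domains; a neighbor list would be a natural replacement for larger systems.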

Once the numerical parameters entering Eq. (1) are calculated, the two-body energy
$U(\{r_i\}) = \sum_{i<j} e_{ij}(\sigma_i,\sigma_j)\,\Delta(|r_i - r_j|)$ can be applied for
describing the conformational space of the protein. In ref. [12], for example, besides the
calculation of mutational $\Delta\Delta G$, it was used to identify the frustrated regions of
the protein. In doing so, the fields $h_i(\sigma)$ were regarded just as chemical potentials
meant to fix the average concentration of the twenty types of amino acids. Consequently, they
were considered relevant only to control the underlying evolution of the set of homologous
proteins, but not for the characterization of the conformational space of a well-defined
sequence of fixed amino-acid composition. Hence, they were neglected in the calculation of
the $\Delta\Delta G$.

However, one can think that the fields $h_i(\sigma)$ contain not only a chemical potential, but
also a real interaction contribution associated with the position of a specific amino acid
within the native conformation of the protein, not encoded in the two-body terms
$e_{ij}(\sigma,\tau)$, and thus controlled by the $f_i(\sigma)$ rather than by the
$f_{ij}(\sigma,\tau)$. This could be the case, for example, of the hydrophobic interaction,
which depends, in first approximation, on the degree of burial of the $i$th site in the
protein conformation, and not on a sum of two-body terms.

In the present work we want to disentangle the contribution to the potential which can be
interpreted as an interaction term from the one which is purely a chemical potential. We
show that the evolution of protein sequences onto a (fixed) native conformation can be
described by an effective energy of the form

$$
\mathcal{U}(\{\sigma_i\}) = \sum_{i<j} e_{ij}(\sigma_i,\sigma_j)\,\Delta(|r_i - r_j|) + \sum_i \eta_i(\sigma_i) + \sum_i \mu(\sigma_i) \qquad (2)
$$

where $\eta_i(\sigma_i)$ is the associated energy and $\mu(\sigma_i)$ is the chemical potential.
We regard the first two terms as an effective interaction potential

$$
U(\{r_i\}) = \sum_{i<j} e_{ij}(\sigma_i,\sigma_j)\,\Delta(|r_i - r_j|) + \sum_i \theta_i(\{r_j\})\,\eta_i(\sigma_i) \qquad (3)
$$

assigning a conformational dependence to its second term through a function
$\theta_i(\{r_j\})$ which measures the solvent exposure of the $i$th residue. We show that this
effective potential predicts the experimental $\Delta\Delta G$ better than the model involving
only the two-body terms did.


II. DERIVATION OF THE POTENTIAL 


Given an alignment of $M$ homologous sequences, the input of the model is, as in the case
of ref. [11], the frequency $f_i(\sigma)$ of the amino acid of type $\sigma$ at site $i$ and the
frequency $f_{ij}(\sigma,\tau)$ of the pair of types $\sigma$ and $\tau$ at sites $i$ and $j$,
respectively, reweighted by the appropriate pseudocounts as


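[Eq. (4): expressions for the reweighted frequencies $f_i(\sigma)$ and $f_{ij}(\sigma,\tau)$ in
terms of the raw frequencies, the pseudocount weights $x$, $y$, $z$, the number of residue
types $q$ and the effective number of sequences $M_e$, with overall normalization
$M_e(x + y + z + 1)$; the full expression is not legible in the source.] (4)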


where $\tilde f_i(\sigma)$ and $\tilde f_{ij}(\sigma,\tau)$ are the raw frequencies, $m_s$ is the
number of sequences with similarity larger than 70% to sequence $s$, $q$ is the number of
residue types and $M_e = \sum_s 1/m_s$ is an effective number of sequences.

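We shall search for a potential which generates a global distribution $p(\{\sigma_i\})$ for
residue types in all the positions of the alignment that matches the empirical distributions.
In particular, we shall require that

$$
\begin{aligned}
P(\tau) &\equiv \frac{1}{L}\sum_{\{\sigma_k\}} p(\{\sigma_k\}) \sum_{i=1}^{L} \delta(\sigma_i,\tau) = \frac{1}{L}\sum_{i=1}^{L} f_i(\tau), \\
\Delta P_i(\tau) &\equiv \sum_{\{\sigma_k\}} p(\{\sigma_k\}) \Big[\delta(\sigma_i,\tau) - \frac{1}{L}\sum_{j=1}^{L} \delta(\sigma_j,\tau)\Big] = f_i(\tau) - \frac{1}{L}\sum_{j=1}^{L} f_j(\tau), \\
P_{ij}(\tau,\rho) &\equiv \sum_{\{\sigma_k\}} p(\{\sigma_k\})\,\delta(\sigma_i,\tau)\,\delta(\sigma_j,\rho) = f_{ij}(\tau,\rho).
\end{aligned}
\qquad (5)
$$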


The quantity $P(\tau)$ is the overall probability to find an amino acid of type $\tau$ in any
site, while $\Delta P_i(\tau)$ is the difference between the probability in a specific site and
the overall one, defined in such a way as to be uncorrelated with $P(\tau)$. We also define the
connected correlation function
$C_{ij}(\tau,\rho) = f_{ij}(\tau,\rho) - f_i(\tau)\,f_j(\rho)$.
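
The connected correlation function defined above is a single array operation once the frequencies are stored as arrays. The sketch below assumes `f_i` has shape (L, q) and `f_ij` has shape (L, L, q, q); this storage layout and the function name are illustrative choices.

```python
import numpy as np

def connected_correlation(f_i, f_ij):
    """C_ij(tau, rho) = f_ij(tau, rho) - f_i(tau) * f_j(rho)."""
    return f_ij - np.einsum('ia,jb->ijab', f_i, f_i)

# Two sites with two residue types that always mutate together:
f_i = np.array([[0.5, 0.5], [0.5, 0.5]])
f_ij = np.zeros((2, 2, 2, 2))
f_ij[0, 1] = [[0.5, 0.0], [0.0, 0.5]]
f_ij[1, 0] = f_ij[0, 1].T
C = connected_correlation(f_i, f_ij)
```

For statistically independent sites $f_{ij}$ factorizes and the off-diagonal blocks of `C` vanish; in the example, perfectly covarying sites give entries of magnitude 0.25.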

Since we have no other knowledge of the potential but the frequencies defined above, it
seems reasonable to use the principle of maximum entropy with the constraints given by Eq.
(5) and the normalization condition of $p(\{\sigma_i\})$. Maximizing the entropy we obtain

$$
p(\{\sigma_i\}) = \frac{1}{Z}\exp\Big\{-\sum_{i<j} e_{ij}(\sigma_i,\sigma_j) - \sum_{i=1}^{L}\Big[\mu(\sigma_i) + h_i(\sigma_i) - \frac{1}{L}\sum_{j=1}^{L} h_i(\sigma_j)\Big]\Big\} \qquad (6)
$$

where the quantities $e_{ij}(\sigma,\tau)$, $h_i(\sigma)$ and $\mu(\sigma)$ are Lagrange
multipliers. Due to the formal similarity with Boltzmann's distribution, we regard these
quantities as effective energies. In particular, $\mu$ is site-independent and we assign to
it the meaning of a chemical potential.


Assuming that there are $q$ types of amino acids, Eq. (6) contains $q + Lq + q^2 L(L-1)/2$
parameters. The experimental input of Eq. (5) consists of
$(q-1) + (L-1)(q-1) + (q-1)^2 L(L-1)/2$ independent equations. Consequently, one has
$1 + (L+q-1) + (2q-1)L(L-1)/2$ free parameters which can be used to set the zeros of the
energies. We must thus choose some $\bar\sigma$, $\hat\sigma$, $\sigma^*$ and $\bar\imath$
such that

$$
\begin{aligned}
\mu(\bar\sigma) &= 0, \\
h_i(\hat\sigma) &= 0 \quad \forall i, \\
h_{\bar\imath}(\sigma) &= 0 \quad \forall \sigma, \\
e_{ij}(\sigma^*,\sigma) &= e_{ij}(\sigma,\sigma^*) = 0.
\end{aligned}
\qquad (7)
$$

In other words, one has to choose an amino-acid type $\bar\sigma$ as the zero of the chemical
potential, a type $\hat\sigma$ as the zero for the field $h_i$ in each site (which in principle
could be different from site to site), and a site $\bar\imath$ (the reference site) in which
the field $h_{\bar\imath}(\sigma) = 0$ for any type of amino acid.
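
The counting of parameters, independent equations and free parameters given above Eq. (7) can be checked with a few lines of Python. The function name is hypothetical, and the values of L and q used in the example are arbitrary illustrative choices, not values taken from the text.

```python
def count_parameters(L, q):
    """Return (parameters of the maximum-entropy distribution,
    independent empirical constraints, free parameters left to fix
    the zeros of the energies) for L sites and q residue types."""
    pairs = L * (L - 1) // 2          # number of site pairs i<j
    n_params = q + L * q + q**2 * pairs
    n_eqs = (q - 1) + (L - 1) * (q - 1) + (q - 1)**2 * pairs
    n_free = 1 + (L + q - 1) + (2 * q - 1) * pairs
    return n_params, n_eqs, n_free

n_params, n_eqs, n_free = count_parameters(L=58, q=21)
# the free parameters are exactly the excess of parameters over equations
print(n_params - n_eqs == n_free)
```

The identity holds term by term: $q - (q-1) = 1$, $Lq - (L-1)(q-1) = L + q - 1$ and $q^2 - (q-1)^2 = 2q - 1$.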

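For the purpose of determining the numerical values of the fields $h$ and of the chemical
potentials $\mu$ in Eq. (6), we follow the spirit of ref. [11] and write the argument of its
exponential in terms of an effective energy

$$
\mathcal{U}_\alpha = \alpha \sum_{i<j} e_{ij}(\sigma_i,\sigma_j) + \sum_{i=1}^{L}\Big[\mu(\sigma_i) + h_i(\sigma_i) - \frac{1}{L}\sum_{j=1}^{L} h_i(\sigma_j)\Big] \qquad (8)
$$

depending on the parameter $\alpha$, which controls the ratio between the two-body energy and
the other energy terms. The associated Helmholtz free energy is

$$
F_\alpha = -\ln(Z) = \langle \mathcal{U}_\alpha \rangle - S \qquad (9)
$$

where temperature is immaterial in this derivation and is set to 1. The Gibbs free energy,
obtained by a Legendre transform over the independent variables, is

$$
G_\alpha = F_\alpha - \sum_{\sigma=1}^{q-1} \mu(\sigma)\,\frac{\partial[-\ln(Z)]}{\partial \mu(\sigma)} - \sum_{i=1}^{L-1}\sum_{\sigma=1}^{q-1} h_i(\sigma)\,\frac{\partial[-\ln(Z)]}{\partial h_i(\sigma)} \qquad (10)
$$

in which the partial derivatives can be shown to be exactly $L\,P(\sigma)$ and
$\Delta P_i(\sigma)$, respectively. Consequently,

$$
G_\alpha = F_\alpha - L\sum_{\sigma=1}^{q-1} \mu(\sigma)\,P(\sigma) - \sum_{i=1}^{L-1}\sum_{\sigma=1}^{q-1} h_i(\sigma)\,\Delta P_i(\sigma). \qquad (11)
$$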

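From Eq. (11) it follows that the values of the fields and of the chemical potentials can be
obtained as

$$
\mu(\sigma) = -\frac{1}{L}\,\frac{\partial G_\alpha}{\partial P(\sigma)} \qquad (12)
$$

$$
h_i(\sigma) = -\frac{\partial G_\alpha}{\partial \Delta P_i(\sigma)} \qquad (13)
$$

To find a manageable expression for $G_\alpha$, this is expanded to first order around
$\alpha = 0$, that is

$$
G_\alpha \approx G_0 + \left.\frac{dG_\alpha}{d\alpha}\right|_{\alpha=0} \alpha. \qquad (14)
$$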


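In the zeroth-order term, the two-body energy does not appear because it is proportional to
$\alpha$, while the thermal average [cf. Eq. (9)] of the other three terms of the effective
potential [cf. Eq. (8)] cancels out with the last two terms of Eq. (11), leaving only the
opposite of the entropy. Writing it in terms of the independent probabilities only, one obtains

$$
\begin{aligned}
-S_0 ={}& \sum_{i=1}^{L-1}\sum_{\sigma=1}^{q-1} P_i(\sigma)\ln[P_i(\sigma)] \\
&+ \sum_{i=1}^{L-1} \Big[1 - \sum_{\sigma=1}^{q-1} P_i(\sigma)\Big] \ln\Big[1 - \sum_{\sigma=1}^{q-1} P_i(\sigma)\Big] \\
&+ \sum_{\sigma=1}^{q-1} \Big[L P(\sigma) - \sum_{i=1}^{L-1} P_i(\sigma)\Big] \ln\Big[L P(\sigma) - \sum_{i=1}^{L-1} P_i(\sigma)\Big] \\
&+ \Big[1 - \sum_{\sigma=1}^{q-1}\Big(L P(\sigma) - \sum_{i=1}^{L-1} P_i(\sigma)\Big)\Big] \ln\Big[1 - \sum_{\sigma=1}^{q-1}\Big(L P(\sigma) - \sum_{i=1}^{L-1} P_i(\sigma)\Big)\Big].
\end{aligned}
\qquad (15)
$$

In the second, third and fourth lines, the square brackets contain expressions for
$P_i(\hat\sigma)$, $P_{\bar\imath}(\sigma)$ and $P_{\bar\imath}(\hat\sigma)$, respectively, which
are not independent of the other probabilities [cf. Eq. (5)].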


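Remembering that $P(\sigma) + \Delta P_i(\sigma) = P_i(\sigma)$, the first-order term in Eq. (14)
turns out to be identical to that of ref. [11] and can be written as

$$
\left.\frac{dG_\alpha}{d\alpha}\right|_{\alpha=0} = \sum_{\sigma,\tau}\sum_{i<j} e_{ij}(\sigma,\tau)\,P_i(\sigma)\,P_j(\tau). \qquad (16)
$$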


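Inserting into Eqs. (12) and (13) the expressions of Eqs. (14), (15) and (16), one obtains

$$
h_m(\sigma) = -\ln\frac{P_m(\sigma)}{P_m(\hat\sigma)} + \ln\frac{P_{\bar\imath}(\sigma)}{P_{\bar\imath}(\hat\sigma)} - \alpha \sum_{\tau}\sum_{i \neq m} e_{mi}(\sigma,\tau)\,P_i(\tau) \qquad (17)
$$

and

[Eq. (18): the corresponding expression for the chemical potential $\mu(\sigma)$, containing a
logarithmic term in the single-site probabilities and a first-order term proportional to
$\alpha$ summed over $\tau$ and over the sites $i \neq m$; the full expression is not legible
in the source.] (18)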


On the other hand, since the second term of Eq. (8) can be written as a sum of one-body terms,
the two-body interaction terms do not change with respect to ref. [11] [cf. Eq. (1)],
resulting in

$$
e_{ij}(\sigma,\tau) = -C^{-1}_{ij}(\sigma,\tau). \qquad (19)
$$


For the sake of simplicity, we shall write the potential which controls the Boltzmann
probability of Eq. (6) as

$$
\mathcal{U} = \sum_{i<j} e_{ij}(\sigma_i,\sigma_j)\,\Delta(|r_i - r_j|) + \sum_{i=1}^{L} \eta_i(\sigma_i) + \sum_{i=1}^{L} \mu(\sigma_i) \qquad (20)
$$

with $\eta_i(\sigma_i) = h_i(\sigma_i) - L^{-1}\sum_{j=1}^{L} h_i(\sigma_j)$. The function
$\Delta(|r_i - r_j|)$, which is zero if $|r_i - r_j| > d_r$, is also inserted in the potential
to reduce the noise in the calculation of the energy in the native conformation. In fact,
pairs of residues which do not interact directly should have $C^{-1}_{ij} = 0$ due to the
procedure described above to suppress indirect correlations. Effects such as the limited
statistics of counts, or the approximation associated with the perturbative expansion of the
potential, could result in non-zero energies even in the absence of direct correlations.
Since we expect correlations to drop with the distance between residues, we introduce the
$\Delta$ function (the choice of $d_r$ is discussed in detail in Sect. IV) to avoid
spurious effects.
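
The definition of $\eta_i$ given with Eq. (20) can be illustrated with a short Python sketch. It assumes the fields are stored as an (L, q) array `h` and the sequence as an integer array `seq`; both the layout and the function name are hypothetical.

```python
import numpy as np

def eta_terms(h, seq):
    """eta_i = h_i(sigma_i) - (1/L) * sum_j h_i(sigma_j) for every site i.
    h: (L, q) array of fields; seq: length-L integer array of residue types."""
    h = np.asarray(h, dtype=float)
    seq = np.asarray(seq)
    h_on_seq = h[:, seq]              # h_on_seq[i, j] = h_i(sigma_j)
    return np.diag(h_on_seq) - h_on_seq.mean(axis=1)

# Two sites, two residue types
h = np.array([[0.0, 1.0],
              [2.0, 4.0]])
eta = eta_terms(h, [0, 1])
```

The subtracted term is the field of site $i$ averaged over the composition of the whole sequence, so $\eta_i$ measures how much site $i$ favors its own residue relative to that average.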


III. EFFECT OF THE MANY-BODY TERM ON THE PREDICTION OF
THE EXPERIMENTAL $\Delta\Delta G$


To test the validity of the potential defined by Eq. (20) we shall calculate the energetic
effect $\Delta\Delta G$ of 308 point mutations on the stability of 14 proteins and compare
them with the experimental values.

The quantity $\Delta\Delta G$ is the change in the difference between the free energies of
the denatured and of the native state of the protein upon mutation. To calculate this quantity
we therefore need to define the free energy of the denatured state. We assume, as is often
done when interpreting experimental data, that the mutation has no effect on the entropy
of the chain, and that the interaction terms are zero in the denatured state (cf. Eq. (3)).
Consequently, we shall make use of the interaction potential

$$
U(\{r_i\}) = \sum_{i<j} e_{ij}(\sigma_i,\sigma_j)\,\Delta(|r_i - r_j|) + \sum_{i=1}^{L} \theta_i(\{r_j\})\,\eta_i(\sigma_i), \qquad (21)
$$

where $\theta_i(\{r_j\})$ is some function of the coordinates of the protein which is 1 in the
native conformation and zero in the denatured state. This function is not simply the sum of
two-body terms (accounted for by the first term of Eq. (21)), and consequently should be
regarded as a many-body interaction. The chemical potential has been dropped because it plays
no role in configurational space, in which the sequence $\{\sigma_i\}$ of the protein is fixed.
The energetic effect of a point mutation is thus described by


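$$
\Delta\Delta G(\sigma_i \to \sigma') = \sum_{j} [e_{ij}(\sigma_i,\sigma_j) - e_{ij}(\sigma',\sigma_j)]\,\Delta(|r_i - r_j|) + \eta_i(\sigma_i) - \eta_i(\sigma'). \qquad (22)
$$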
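
A point-mutation energy of the form of Eq. (22), that is the native-state energy of the wild type minus that of the mutant with $\theta_i = 1$, could be evaluated as in the sketch below. The array layout (`e` as L x L x q x q, `eta` as L x q, `delta` as an L x L contact matrix) and the function name are assumptions made for illustration.

```python
import numpy as np

def ddg_point_mutation(seq, i, new_type, e, eta, delta):
    """Energetic effect of mutating site i from seq[i] to new_type:
    two-body differences over the contacts of site i plus the difference
    of the one-site eta terms (native state, theta_i = 1)."""
    old = seq[i]
    two_body = sum((e[i, j, old, seq[j]] - e[i, j, new_type, seq[j]]) * delta[i, j]
                   for j in range(len(seq)) if j != i)
    return two_body + eta[i, old] - eta[i, new_type]

# Toy example: 3 sites, 2 residue types, a single contact between sites 0 and 1
e = np.zeros((3, 3, 2, 2))
e[0, 1, 0, 1] = -1.0      # wild-type pair is attractive
e[0, 1, 1, 1] = 0.5       # mutant pair is repulsive
delta = np.zeros((3, 3), dtype=int)
delta[0, 1] = delta[1, 0] = 1
eta = np.array([[0.2, -0.3], [0.0, 0.0], [0.0, 0.0]])
ddg = ddg_point_mutation(np.array([0, 1, 0]), 0, 1, e, eta, delta)
```

Only the contacts of the mutated site enter the sum, so the cost of one evaluation is linear in the sequence length.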


The protein-independent parameters of the model which gave the best results in terms of
correlation coefficient between calculated and experimental $\Delta\Delta G$ are
$d_r = 4.0$ Å, $\alpha = 0.15$, $x = 0.5$, $y = 0.1$, $z = 1.0$ and the definition of the
reference site as the site most exposed to the solvent among those occupied by polar or
charged residues (for membrane proteins see below). The effect of varying these parameters
is described in Sect. IV.


For this study we chose a set of protein domains with at least 1000 homologs in the
Pfam database, whose native structure is present in the PDB and on which the energetic
effect of mutations has been characterized. This set is listed in Table I. The calculated
values of $\Delta\Delta G$ are plotted versus their experimental values in Fig. 1. The overall
correlation coefficient between predicted and experimental values, excluding 23 outliers, is
$r = 0.77$. This should be compared with the value $r = 0.47$ obtained by predicting the
$\Delta\Delta G$ with a potential including only the two-body term $e_{ij}$, without the term
$\eta_i$ (see Fig. S1 in the Supplemental Material).

A point is regarded as an outlier if the difference between the calculated and the
experimental value is larger than $3\sigma$, where $\sigma$ is the error provided by the
overall fit, also including the experimental error bars when available. Outliers can be
classified into three categories (see Table S1 in the Supplemental Material). Ten of them
correspond to sites which are highly conserved, and consequently there is little (or no)
statistics for the mutated sequence; two outliers are in sites which were experimentally
characterized as structured in the denatured state, thus invalidating Eq. (22). The remaining
11 outliers cannot be explained in a satisfactory way, or the denatured state of their
protein is not precisely determined experimentally.
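
The outlier criterion and the correlation coefficient can be sketched in Python as follows. Several choices here are assumptions, since the text does not specify them: the fit is taken to be an ordinary least-squares line, $\sigma$ is the standard deviation of its residuals, and experimental error bars, when given, are propagated through the slope and added in quadrature.

```python
import numpy as np

def flag_outliers(calc, expt, expt_err=None, n_sigma=3.0):
    """Boolean mask of points whose residual from a linear fit of
    calculated vs experimental values exceeds n_sigma times the fit error."""
    calc = np.asarray(calc, dtype=float)
    expt = np.asarray(expt, dtype=float)
    slope, intercept = np.polyfit(expt, calc, 1)
    resid = calc - (slope * expt + intercept)
    sigma = resid.std(ddof=2)                      # two fitted parameters
    err = np.zeros_like(expt) if expt_err is None else np.asarray(expt_err, dtype=float)
    tolerance = n_sigma * np.sqrt(sigma**2 + (slope * err)**2)
    return np.abs(resid) > tolerance

def pearson_r(calc, expt, exclude=None):
    """Pearson correlation coefficient, optionally excluding flagged points."""
    calc = np.asarray(calc, dtype=float)
    expt = np.asarray(expt, dtype=float)
    keep = np.ones(len(calc), dtype=bool) if exclude is None else ~np.asarray(exclude)
    return np.corrcoef(calc[keep], expt[keep])[0, 1]

# Synthetic data: a perfect line with one displaced point
expt = np.arange(21.0)
calc = expt.copy()
calc[10] += 10.0
mask = flag_outliers(calc, expt)
r = pearson_r(calc, expt, exclude=mask)
```

Because the outlier itself inflates $\sigma$, a single pass is conservative; iterating the fit after removing flagged points is a common alternative.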

The correlation coefficients between predicted and experimental data for each protein are
displayed in Fig. 2 and are compared with those obtained without the term $\eta_i$ (cf. Fig. S2
in the Supplemental Material, in which a detailed comparison of the $\Delta\Delta G$ is shown
for each protein). Including the new term $\eta_i$ gives a better correlation for most of
the proteins (only 1BVC slightly decreases from 0.81 to 0.79 and 2ABD from 0.87 to 0.82).

The set also includes a membrane protein (bacteriorhodopsin, PDB entry 2BRD), for
which this method is successful in predicting $\Delta\Delta G$ for 24 mutations, without any
outlier. To obtain this result we used a different reference site $\bar\imath$ than for
cytosolic proteins, namely the most exposed hydrophobic site. Not unexpectedly, using for
bacteriorhodopsin the same reference site used for the soluble proteins (i.e., the most
exposed polar/charged site) gave a poor correlation coefficient of 0.53.


IV. ROLE OF THE PARAMETERS OF THE MODEL 

The model is defined by the values of $d_r$, $x$, $y$, $z$, and by the choice of the reference
states in Eq. (7). Moreover, although the maximum-entropy principle is satisfied for
$\alpha = 1$, we found a better agreement with the experimental data for $\alpha < 1$.
Consequently, we regard $\alpha$ as a parameter of the model as well.

The dependence of the correlation coefficient $r$ between predicted and experimental
$\Delta\Delta G$ on the interaction range $d_r$ of the two-body term is displayed in Fig. 3 for
some of the proteins studied above (see also Fig. S3 in the Supplemental Material for the
other proteins). For all proteins $r$ is a decreasing function of $d_r$, modulated by an
oscillating behavior. Its maximum lies between 3 and 6 Å, depending on the protein. The period
of oscillation, of about 3-4 Å, is compatible with the size of the shells of other residues
interacting with each residue in the native conformation. The best choice for $d_r$ seems to
be 4.0 Å, although small variations of this value have little effect on the prediction of the
$\Delta\Delta G$.

The correlation coefficients $r$ as a function of $\alpha$ are displayed in Fig. 4 (cf. also
Fig. S4 in the Supplemental Material). Overall, they display a maximum at low values of
$\alpha$ and decrease when $\alpha$ approaches 1. In a few cases, the maximum is exactly at
$\alpha = 0$, that is when the terms $h_i(\sigma)$ are decoupled from the terms
$e_{ij}(\sigma,\tau)$ [see Eq. (17)]. In the production calculations we chose $\alpha = 0.15$,
although small variations of $\alpha$ have little effect if it is kept small, that is in the
range where the perturbation expansion of the Gibbs free energy holds.

The coefficients $x$, $y$ and $z$ weight the pseudocounts, which are a priori probabilities
meant to compensate for the limited statistics in the alignments and make the correlation
matrix invertible. These three parameters weight the pseudocounts which depend, respectively,
on the overall fraction of residue types, on the overall fraction of residue types in the
specific position, and on the overall fraction of residue types in the specific pair of
positions. The dependence of $r$ on these parameters is displayed in Fig. S5 in the
Supplemental Material. For most of the proteins the best choice is $x = 0.5$, $y = 0.1$,
$z = 1.0$. In any case, the quality of the results depends mainly on $z$, while the choice of
$x$ and $y$ does not seem critical.

While a natural and efficient choice for the reference state [see Eq. (7)] of the two-body
term $e_{ij}(\sigma,\tau)$ is the gaps in the alignment, the choice of the reference state of
the terms $h_i(\sigma)$ is not straightforward. For cytosolic proteins, a sensible choice seems
to be to set the reference site at the position of the most exposed polar or charged residue.
The degree of solvent exposure of a residue is quantified by the occupancy factor
$S_{fact}$ defined in ref. [16]. This choice ensures that the many-body effective energy
associated with the reference site does not change upon folding, since in the denatured state
($\theta = 0$) the side chain is approximately as exposed as it is in the native state
($\theta = 1$). Suboptimal choices do not change the correlation coefficient dramatically,
while the choice of hydrophobic sites significantly decreases it.

Bacteriorhodopsin, which is a membrane protein, behaves in the opposite way. Good
results are obtained using the most exposed hydrophobic site as reference, and they worsen
when more hydrophilic sites are chosen.


V. PROPERTIES OF THE $\eta$-TERM

The term $\eta_i(\sigma)$ in the potential accounts for the contribution to the total energy
which is not related to two-body interactions. As a result of the principle of maximum
entropy, Eq. (6), it is formally a one-body term of the potential, that is an external field.
However, it is hard to justify an external field in the present context, and consequently
$\eta_i(\sigma)$ must be regarded as the result of the combined effect of the surrounding
residues, that is a many-body term.


The average value of $\eta$ over all its occurrences in the proteins of Table I for each type
of amino acid is displayed in Fig. 6. Except for proline and tyrosine, the average of $\eta$
has a good correlation ($r = 0.81$) with the hydrophobicity of the corresponding residue, as
measured by the scale of Kyte and Doolittle. This fact suggests that $\eta$ represents, at
least partially, the contribution of the solvent to the positioning of the amino acids in the
native conformation of the proteins. In fact, it is known that effective interactions
associated with the presence of the solvent are intrinsically many-body.

While it is not completely unexpected that proline escapes the linear correlation between
$\eta$ and hydrophobicity, because of its peculiar, rigid chemical structure, the behavior of
tyrosine is surprising. It cannot be explained in terms of poor statistics, since
tyrosine appears in the proteins studied above with a frequency comparable to that of the
other residues.

For the calculation of the $\Delta\Delta G$, the conformational dependence of the $\eta$-term
of the potential has been regarded as two-state, in the sense that the only property needed
of the function $\theta_i(\{r_j\})$ in Eq. (21) was to be 1 in the native state and 0 in the
denatured state. To extend the use of the effective potential $U$ to characterize the
conformational properties of a protein, one should define the full functional form of
$\theta_i(\{r_j\})$. The correlation of the $\eta$-term with the hydrophobicity of the
corresponding amino acids suggests that a reasonable assumption for $\theta_i(\{r_j\})$ is the
relative change in solvent exposure of the amino acid with respect to the native
conformation, something which is indeed a many-body feature.


VI. CONCLUSIONS 

While effective potentials based on ab initio calculations contain no more and no less than
the physical terms which are used in the underlying calculations, statistical potentials have
the virtue of summarizing all possible physical effects, even unknown ones. As an example of
their power, statistical potentials do not distinguish between globular and membrane
proteins. Moreover, their functional form is usually simpler, and thus computationally
cheaper, than that of other kinds of force fields. Statistical potentials are therefore a
potentially powerful tool to study the properties of proteins. In particular, those obtained
from the analysis of mutational correlations proved efficient in predicting the native
conformation of proteins and the experimental $\Delta\Delta G$.

In the present work we have shown that the prediction of experimental $\Delta\Delta G$ can be
further improved by including a many-body term in the interaction potential. This term
arises naturally from a maximum-entropy principle, and can be parametrized within the
same theoretical framework used for the two-body interaction term. It partially describes 
the effective interaction due to the solvent, but probably also other effects which cannot be 
reduced to a two-body interaction. As typical for statistical potentials, the choice of the 
reference state, that is the zero of the energy terms, plays a critical role in the correctness 
of the results. 
Protein/Domain            PDB    Family   M       M_eff   Mutat.
BPTI                      1BPI   00014    4915    1566    ?
Myoglobin                 1BVC   00042    6000    688     ?
FKBP1                     1FKJ   00014    16739   2284    ?
c-Src/SH3 dom.            1FMK   00018    10749   1542    ?
Fibronectin/fnIII dom.    1FNA   00041    17225   8102    ?
PTP-BL/PDZ dom.           1GM1   00595    26099   2715    ?
α-Lactalbumin             1HMK   00062    1035    119     ?
ecDHFR                    1RX4   00186    5237    956     ?
Staphyloc. nuclease       1STN   00565    4232    1144    ?
ACBP                      2ABD   00887    1677    420     ?
Bacteriorhodopsin         2BRD   01036    3174    208     24
Del1-9-G129R-hPRL         2Q98   00103    1608    97      ?
Tenascin/fnIII dom.       2RB8   00041    17225   8054    ?
Azurin                    5AZU   00127    1467    282     ?

TABLE I: The list of protein domains, with the associated PDB structure, the id of the
Pfam family, the number M of sequences in the family, the number M_eff of effective
sequences after reweighting for similarity and the number of mutations characterized
experimentally. Entries marked "?" are not legible in the source.
FIG. 1: The values of $\Delta\Delta G$ predicted by the model as a function of the
corresponding experimental values.
FIG. 2: The correlation coefficient between predicted and experimental $\Delta\Delta G$ for
each protein. The red bars indicate the results obtained calculating the energies with the
two-body term only, while the blue bars with the complete potential. The protein marked
with an asterisk is a membrane protein.
FIG. 3: The correlation coefficient $r$ as a function of the interaction range $d_r$ of the
two-body energy term (legend entries: 1BPI, 1FMK, 1HMK, 2Q98).
FIG. 4: The correlation coefficient $r$ as a function of the perturbation coefficient $\alpha$.
FIG. 5: The correlation coefficient $r$ as a function of the choice of the reference state for
$h_i(\sigma)$ for 1BPI. The color code indicates the degree of solvent exposure. The color
scale goes from red (exposed) to green (buried). Residue K23 (K26 according to the numbering
of the PDB) is selected as the reference state.
FIG. 6: The correlation between the average value of $\eta$ associated with each type of amino
acid and its hydrophobicity, defined by the scale of Kyte and Doolittle. Excluding proline
and tyrosine, the correlation coefficient is 0.81.